We confirm universal behaviors such as the eigenvalue distribution and spacings predicted by Random Matrix Theory (RMT) for the cross correlation matrix of the daily stock prices of Tokyo Stock Exchange from 1993 to 2001. These behaviors have been reported for New York Stock Exchange in previous studies. It is shown that the random part of the eigenvalue distribution of the cross correlation matrix is stable even when deterministic correlations are present. Some deviations in the small eigenvalue statistics outside the bounds of the universality class of RMT are not completely explained by the deterministic correlations proposed in previous studies. We study the effect of randomness on deterministic correlations and find that randomness causes a repulsion between deterministic eigenvalues and the random eigenvalues. This is interpreted as a reminiscence of "level repulsion" in RMT, and it explains some of the deviations from previous studies that are observed in the market data. We also study correlated groups of issues in these markets and propose a refined method, based on RMT, to identify correlated groups. Some characteristic differences between the properties of Tokyo Stock Exchange and New York Stock Exchange are found.

PACS numbers: 5.40.Fb, 89.65 



I. INTRODUCTION 

The price changes of securities such as stocks reflect various economic backgrounds as well as interactions between securities, and they seem to be quite complicated. Conventionally, financial economists model the price changes of securities by stochastic processes (random walks) [1]. This is a basic ingredient of modern portfolio theory. Although stochastic processes are commonly used in finance, the validity of such a formulation should be tested empirically, e.g. through statistical properties of the markets, because the underlying ergodic property of a market may be hard to establish.

Recently, statistical characterizations of financial markets based on physics concepts and methods have attracted considerable attention. If a stochastic model is valid, some statistical properties of the market should follow as consequences of stochasticity. For example, the cross correlation matrix among N securities can be regarded as a random matrix. It may then be legitimate to expect that it shares the universal properties of a corresponding ensemble of Random Matrix Theory (RMT) in an appropriate large-N limit, since N is usually large. This has been confirmed by several studies on actual stock markets. The bulk of the eigenvalue distribution of the cross correlation matrix of a major index (S&P 500) of New York Stock Exchange (NYSE) is found to follow the eigenvalue distribution of the Wishart matrix […], which is a random correlation matrix constructed from mutually uncorrelated time series. The eigenvalue spacing statistics are also found to follow those of the Gaussian Orthogonal Ensemble (GOE) […].

The aim of this paper is to provide further support for the applicability of RMT to the analysis of stock markets.

In Sec. II we give a brief review of the relevant results of RMT. We describe our data sample in Sec. III. In Sec. IV we test predictions of RMT for the cross correlation matrix of the daily prices of the issues in Tokyo Stock Exchange (TSE) from 1993 to 2001. The quantities we calculate are the distribution of the eigenvalues, the nearest and next-nearest neighbor spacings, the rigidity, and a certain moment of the eigenvector components. For the eigenvalues within the RMT bounds, we find good agreement with the real data. However, there are clear deviations outside the bounds, which indicate the presence of deterministic correlations among issues. In Sec. V we consider random walks with deterministic correlations and show that the bulk of the eigenvalue distribution of the correlation matrix is stable. In Sec. VI we closely examine the distribution of the moment of the eigenvector components. Eigenvectors corresponding to the eigenvalues outside the RMT bounds deviate from the RMT prediction. According to Ref. [6], the deviating eigenvalues at the lower edge are a consequence of strong correlations among a few issues. However, we find that this reasoning alone does not explain the observed data quantitatively. Therefore we analyze the effect of randomness on deterministic correlations between issues and find an interplay between deterministic correlations and randomness. We argue that this interplay gives a refined explanation of the deviations. In Sec. VII we identify groups of strongly correlated issues from the information in the non-random eigenvectors. The ways of grouping in TSE and NYSE show some differences.



II. BRIEF REVIEW ON RANDOM MATRIX THEORY 
A. Wishart Matrix 

Let $S_i(t)$ be the price at time $t$ of the stock labeled by $i$ ($i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$). The change of price at time $t$ can be measured by

$$G_i(t) = \ln S_i(t+1) - \ln S_i(t). \qquad (1)$$

We take the logarithm of the prices because stock prices are typically modeled by geometric Brownian motion. Since

$$G_i(t) \simeq \frac{S_i(t+1) - S_i(t)}{S_i(t)}, \qquad (2)$$

$G_i(t)$ is approximately the return of issue $i$ from $t$ to $t+1$. We also define the normalized return $g_i(t)$ as follows:

$$g_i(t) = \frac{G_i(t) - \langle G_i \rangle_T}{\sigma_i}. \qquad (3)$$

Here $\langle \cdots \rangle_T$ indicates the time series average over $T$ steps, and the dispersion $\sigma_i$ is given by

$$\sigma_i = \sqrt{\langle G_i^2 \rangle_T - \langle G_i \rangle_T^2}. \qquad (4)$$

Then the correlation matrix $C$ is expressed in terms of $g_i(t)$ as

$$C_{ij} = \langle g_i g_j \rangle_T. \qquad (5)$$

$C$ is a real symmetric matrix with non-negative eigenvalues.
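The following sketch computes (1)-(5) for a handful of price series. It is a minimal illustration in plain Python: the function names and the toy prices are hypothetical, and the dispersion uses the $1/T$ convention of (4).

```python
import math

def log_returns(prices):
    """G(t) = ln S(t+1) - ln S(t), eq. (1)."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def normalize(series):
    """g(t) = (G(t) - <G>_T) / sigma, eqs. (3)-(4), with the 1/T convention for sigma."""
    T = len(series)
    mean = sum(series) / T
    sigma = math.sqrt(sum(x * x for x in series) / T - mean * mean)
    return [(x - mean) / sigma for x in series]

def correlation_matrix(price_table):
    """C_ij = <g_i g_j>_T, eq. (5); price_table[i] is the price history of issue i."""
    g = [normalize(log_returns(p)) for p in price_table]
    N, T = len(g), len(g[0])
    return [[sum(a * b for a, b in zip(g[i], g[j])) / T for j in range(N)]
            for i in range(N)]

# Three hypothetical issues observed on five days.
prices = [[100, 101, 99, 102, 103], [50, 51, 50, 52, 53], [20, 19.5, 20.5, 20, 19]]
C = correlation_matrix(prices)
for row in C:
    print(["%.3f" % x for x in row])
```

By construction the diagonal of $C$ equals 1, so the trace of $C$ is $N$. This fact is used in Sec. V.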

We will model the prices of stocks as a stochastic process (random walk). For $N$ random walks $x_i(t)$ ($i = 1, 2, \ldots, N$), the matrix $M$ defined by $M_{ti} = x_i(t)$ is a $T \times N$ matrix. The cross correlation matrix $W$ is defined as follows:

$$W_{ij} = \langle x_i x_j \rangle_T = \frac{1}{T}\left(M^t M\right)_{ij}, \qquad (6)$$

where $M^t$ is the transpose of $M$. A purely random case with a uniform dispersion $\sigma$ is given by

$$\langle x_i(t) \rangle = 0, \qquad (7)$$

$$\langle x_i(t)\, x_j(\tau) \rangle = \sigma^2 \delta_{ij} \delta_{t\tau}. \qquad (8)$$

Here $\langle \cdots \rangle$ indicates the average over the random variable phase space. In this case, $W$ is called the Wishart matrix […]. We can include "true" correlations among issues by replacing $\delta_{ij}$ in (8) by a non-diagonal matrix $\tilde{C}$. We will call $\tilde{C}$ the deterministic correlation, while we call $C$ or $W$ the cross correlation.

B. Eigenvalue Statistics of Random Matrices 

Let us summarize the relevant results of RMT to which we will refer in this paper. 

In the limit $N \to \infty$, $T \to \infty$ with $Q = T/N$ fixed, the eigenvalue distribution $\rho(\lambda)$ of the Wishart matrix becomes

$$\rho(\lambda) = \frac{Q}{2\pi\sigma^2}\,\frac{\sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}}{\lambda}, \qquad (9)$$

$$\lambda_{max,min} = \sigma^2\left(1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}\right). \qquad (10)$$

Eq. (9) is exact at $N \to \infty$, $T \to \infty$ with $Q = T/N$ fixed. It is approximately valid at finite $N$ and $T$ when $N$ and $T$ are not small. According to (9) and (10), the eigenvalues of the Wishart matrix are distributed only in the range $(\lambda_{min}, \lambda_{max})$.
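As a numerical check of (9) and (10), the sketch below evaluates the Wishart density and its edges. The helper names are hypothetical. With the TSE sample size of Sec. III ($T = 1848$, $N = 493$, $\sigma^2 = 1$), it reproduces the bounds quoted in Sec. IV.

```python
import math

def wishart_bounds(Q, sigma2=1.0):
    """(lambda_min, lambda_max) of eq. (10) for Q = T/N >= 1."""
    r = 1.0 / Q
    return sigma2 * (1 + r - 2 * math.sqrt(r)), sigma2 * (1 + r + 2 * math.sqrt(r))

def wishart_density(lam, Q, sigma2=1.0):
    """rho(lambda) of eq. (9); zero outside (lambda_min, lambda_max)."""
    lo, hi = wishart_bounds(Q, sigma2)
    if lam <= lo or lam >= hi:
        return 0.0
    return Q / (2 * math.pi * sigma2) * math.sqrt((hi - lam) * (lam - lo)) / lam

lo, hi = wishart_bounds(1848 / 493)
print(round(lo, 2), round(hi, 2))  # prints 0.23 2.3
```

The density is normalized to 1 over $(\lambda_{min}, \lambda_{max})$ when $Q \ge 1$. This is why Fig. 1 rescales (9) by $N'/N$ when only part of the spectrum is fitted.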

Next we consider the Gaussian ensembles of random matrices. In the Gaussian ensembles, the probability that a matrix $H$ lies in the infinitesimal volume element $dH$ ($dH$ is the product of the infinitesimals of the independent elements) is given by $P(H)\,dH$, where

$$P(H) = A \exp\left(-a \sum_i |\lambda_i|^2\right). \qquad (11)$$

Here $a$ is a parameter that characterizes the ensemble, $\lambda_i$ are the eigenvalues of $H$, and $A$ is the normalization constant. For general ensembles, one replaces the term $\sum_i |\lambda_i|^2$ by $\sum_i V(\lambda_i)$ with a function $V(\lambda)$. For example, one can add quartic or higher-order terms. It is known, however, that in the large-$N$ limit ($N$ is the size of $H$) the model flows to the Gaussian model [10]. The Gaussian models are classified by the symmetry of the matrix:

(i) the Gaussian Orthogonal Ensemble (GOE), invariant under the orthogonal group;
(ii) the Gaussian Symplectic Ensemble (GSE), invariant under the symplectic group;
(iii) the Gaussian Unitary Ensemble (GUE), invariant under the unitary group.

Since the correlation matrix $C$ is real symmetric, the ensemble relevant to our analysis is the GOE. For the GOE, the volume element $dH$ is given by

$$dH = \prod_{i \le j} dH_{ij}. \qquad (12)$$

To obtain the statistical measure of the eigenvalue distribution $P(\lambda_1, \lambda_2, \ldots, \lambda_N)$, one writes $H$ in terms of the diagonal matrix of its eigenvalues and the remaining (eigenvector) variables, and then integrates out the remaining variables. In this way, we get the measure

$$\prod_{i<j} |\lambda_i - \lambda_j|^\beta \prod_k d\lambda_k. \qquad (13)$$

Here $\beta = 1$ for GOE, $\beta = 2$ for GUE and $\beta = 4$ for GSE. Thus the eigenvalue distribution for a Gaussian ensemble is determined by $\beta$. In the same way, we get the eigenvalue distribution for a general potential $V$ as follows:



$$P(\lambda_1, \lambda_2, \ldots, \lambda_N) = A' \exp\left[-\beta\left(\sum_{k=1}^{N} V(\lambda_k) - \sum_{i<j} \ln|\lambda_i - \lambda_j|\right)\right], \qquad (14)$$

where $A'$ is the normalization constant. From (14), one sees that at short spacings the statistical properties are dominated by $-\ln|\lambda_i - \lambda_j|$, and the potential term is negligible there. Thus $\beta$ determines the eigenvalue spacing at short distances. For each $\beta$, the level spacing has been closely studied [11]. Since the correlation matrix is real symmetric, we expect the statistical properties of its eigenvalue spacings to be given by $\beta = 1$.

One can characterize the statistical properties of eigenvalue spacings by three quantities: the nearest neighbor spacing distribution $P_{nn}$, the next-nearest neighbor spacing distribution $P_{nnn}$, and the "rigidity" $\Delta(L)$. $P_{nn}$ and $P_{nnn}$ probe short-range correlations, while $\Delta(L)$ probes long-range correlations. $\Delta(L)$ is defined as

$$\Delta(L) \equiv \left\langle \frac{1}{L} \min_{A,B} \int_{x - L/2}^{x + L/2} \left(F(\lambda') - A\lambda' - B\right)^2 d\lambda' \right\rangle_x, \qquad (15)$$

where $F(\lambda)$ is given by

$$F(\lambda) = \sum_k \theta(\lambda - \lambda_k) \qquad (16)$$

with the Heaviside function $\theta$, and $\langle \cdots \rangle_x$ denotes the average over the window centers $x$. $F(\lambda)$ counts the number of eigenvalues below $\lambda$. In words, $\Delta(L)$ is obtained by fitting $F(\lambda)$ with a straight line in an interval of width $L$ around each eigenvalue and averaging the squared deviations from the fit. $\Delta(L)$ is small when the eigenvalues are evenly spaced.
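Definitions (15) and (16) translate directly into an estimator. The minimal sketch below assumes the eigenvalues have already been unfolded to unit mean spacing. It approximates the integral in (15) by a midpoint average and skips windows that would extend past the ends of the spectrum. The helper names and the `samples` parameter are illustrative choices, not the authors' settings. For an evenly spaced ("picket fence") spectrum, the result should be close to 1/12.

```python
import bisect

def staircase(eigs_sorted, lam):
    """F(lambda) of eq. (16): the number of eigenvalues below lam."""
    return bisect.bisect_left(eigs_sorted, lam)

def window_deviation(eigs_sorted, x0, L, samples=400):
    """(1/L) min_{A,B} of the integral of (F - A*lam - B)^2 over [x0, x0 + L].

    The integral is replaced by an average over `samples` midpoints, and the
    minimization over A and B is an ordinary least-squares line fit.
    """
    h = L / samples
    xs = [x0 + (k + 0.5) * h for k in range(samples)]
    ys = [staircase(eigs_sorted, v) for v in xs]
    mx, my = sum(xs) / samples, sum(ys) / samples
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - slope * a - intercept) ** 2 for a, b in zip(xs, ys)) / samples

def rigidity(eigs, L, samples=400):
    """Delta(L) of eq. (15), averaged over windows of width L centered on the eigenvalues."""
    eigs_sorted = sorted(eigs)
    lo, hi = eigs_sorted[0], eigs_sorted[-1]
    centers = [x for x in eigs_sorted if lo + L / 2 <= x <= hi - L / 2]
    total = sum(window_deviation(eigs_sorted, x - L / 2, L, samples) for x in centers)
    return total / len(centers)

picket_fence = [float(k) for k in range(200)]
print(rigidity(picket_fence, 10.0))
```

For uncorrelated (Poisson) levels, $\Delta(L)$ grows linearly in $L$. For GOE, (19) grows only logarithmically, which is what the rigidity test in Sec. IV probes.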
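For GOE, $P_{nn}$, $P_{nnn}$ and $\Delta(L)$ are given by [11]

$$P_{nn}(s) = \frac{\pi s}{2} \exp\left(-\frac{\pi}{4} s^2\right), \qquad (17)$$

$$P_{nnn}(s) = \frac{2^{18}}{3^6 \pi^3}\, s^4 \exp\left(-\frac{64}{9\pi} s^2\right), \qquad (18)$$

$$\Delta(L) = \frac{1}{15 L^4} \int_0^L du\, (L-u)^3 \left(2L^2 - 9Lu - 3u^2\right)\left(\delta(u) - Y(u)\right), \qquad (19)$$

where $Y(u)$ is the two-level cluster function given by

$$Y(u) = \left(\frac{\sin \pi u}{\pi u}\right)^2 + \left(\frac{d}{du}\,\frac{\sin \pi u}{\pi u}\right) \int_u^\infty \frac{\sin \pi t}{\pi t}\, dt. \qquad (20)$$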
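A quick sanity check on (17) and (18) is that both densities integrate to 1 and have unit mean, as spacing distributions of unfolded levels should. The sketch assumes $P_{nnn}(s) = \frac{2^{18}}{3^6\pi^3} s^4 e^{-64 s^2/(9\pi)}$, the Wigner surmise for the GSE, to which the GOE next-nearest neighbor statistics map. It uses a plain midpoint rule; the cutoff `s_max` is an arbitrary choice.

```python
import math

def p_nn(s):
    """GOE nearest neighbor spacing distribution (Wigner surmise), eq. (17)."""
    return math.pi * s / 2 * math.exp(-math.pi * s * s / 4)

def p_nnn(s):
    """Next-nearest neighbor spacing distribution in the GSE-surmise form assumed for eq. (18)."""
    return 2 ** 18 / (3 ** 6 * math.pi ** 3) * s ** 4 * math.exp(-64 * s * s / (9 * math.pi))

def moments(p, s_max=8.0, n=80000):
    """Zeroth and first moments of p on [0, s_max] by the midpoint rule."""
    h = s_max / n
    m0 = m1 = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        w = p(s) * h
        m0 += w
        m1 += s * w
    return m0, m1

for name, p in (("P_nn", p_nn), ("P_nnn", p_nnn)):
    norm, mean = moments(p)
    print(name, round(norm, 6), round(mean, 6))
```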



According to RMT, the components of an eigenvector of the GOE are normally distributed with mean 0 and variance $1/N$. A useful quantity for characterizing the distribution of components is the Inverse Participation Ratio (IPR) […]. For each eigenvector $u_k$, the IPR is defined by

$$I_k = \sum_{i=1}^{N} (u_{ki})^4, \qquad (21)$$

where $u_{ki}$ is the $i$-th component of $u_k$. For example, consider the case where $u_{ki} = 1/\sqrt{L}$ for $1 \le i \le L$ and $u_{ki} = 0$ for the other $i$. This gives $I_k = 1/L$. Thus the IPR can be interpreted as the inverse of the number of components that differ significantly from zero. In RMT, the expectation value of the IPR is

$$\langle I_k \rangle = \frac{3}{N}. \qquad (22)$$



III. MARKET DATA 



The data we analyzed are the daily stock prices of (i) Tokyo Stock Exchange (TSE) from January 1993 to June 2001 and (ii) the S&P 500 index of New York Stock Exchange (NYSE) from January 1991 to July 2001. For S&P, daily price data for a different period have been analyzed by Laloux et al. [4]. In addition, 30-minute price data for NYSE have been studied by Plerou et al. […]. In the TSE data, the number of data points (the days on which the market is open) is 1848. Among all the issues in TSE, we analyze the 493 issues that are traded on all of the 1848 days. For these data, N = 493 and T = 1848. In the S&P 500 data, the number of data points is 2599. We select the issues that had already been included in the S&P 500 index before 1991 and analyze their prices; they amount to 297. For these data, N = 297 and T = 2598.



IV. UNIVERSAL RANDOM PROPERTIES OF CROSS CORRELATIONS IN STOCK MARKETS

In Refs. […], the cross correlation matrices of NYSE data were analyzed and found to agree remarkably well with the universal predictions of RMT for the distribution of the small eigenvalues, their nearest and next-nearest neighbor spacings, the rigidity and the IPR. In this section, we perform a similar analysis on the TSE data and confirm these properties. We also use the S&P data for comparison.

We diagonalize the correlation matrices of TSE and S&P to obtain the eigenvalues $\lambda_k$ and the eigenvectors $u_k$ ($k = 1, \ldots, N$), ordered so that a smaller $k$corresponds to a larger eigenvalue. For TSE, $\sigma^2 = 1$ and $Q = T/N = 3.75$ give $\lambda_{min} = 0.23$ and $\lambda_{max} = 2.30$; for S&P, $Q = 8.75$ gives $\lambda_{min} = 0.43$ and $\lambda_{max} = 1.79$. As discussed in Ref. [4], we fit the distributions by optimizing $\sigma^2$ to a value smaller than 1. Fig. 1 shows the eigenvalue distribution for TSE. We see that RMT reproduces the small eigenvalue distribution of the correlation matrix of TSE well. There are large eigenvalues beyond the bounds $[\lambda_{min}, \lambda_{max}]$ predicted by the Wishart matrix. The largest eigenvalue we obtain is 121.6 (52.2) for TSE (S&P). It is interpreted as the factor for the market trend, as readily verified by examining the corresponding eigenvector. The weight of this factor in the price changes of individual stocks is given by $\lambda_1/N$, which is 0.247 (0.176) for TSE (S&P). Thus TSE is more strongly correlated with the trend factor than S&P.
FIG. 1: The eigenvalue distribution for the correlation matrix of TSE (horizontal axis: eigenvalue). The solid line in each figure is for the real data, and the dotted line is for the Wishart matrix. For the fit we use (9) multiplied by N'/N, where N' is the number of eigenvalues within [λ_min, λ_max]. σ² is optimized by the least squares method: σ² = 0.47 (0.53) for TSE (S&P). For TSE (S&P), a Kolmogorov-Smirnov test in the fitted region cannot reject the hypothesis that the RMT prediction is the correct description at the 30% (60%) confidence level.




FIG. 2: The nearest neighbor spacing distribution (horizontal axis: nearest neighbor spacing s) and the next-nearest neighbor spacing distribution (horizontal axis: next-nearest neighbor spacing s) for TSE, compared with the prediction of RMT indicated by the dotted line. For the nearest neighbor spacing, a Kolmogorov-Smirnov test cannot reject the hypothesis that the GOE prediction is the correct description at the 30% (80%) confidence level for TSE (S&P). For the next-nearest neighbor spacing, it cannot reject this hypothesis at the 80% (60%) confidence level for TSE (S&P).
FIG. 3: The plus marks show the rigidity Δ(L) for TSE and the x marks show the rigidity for S&P. The line is the prediction of RMT. For both TSE and S&P, a Kolmogorov-Smirnov test cannot reject the hypothesis that the GOE prediction is the correct description at the 80% confidence level.

Next we compare the nearest neighbor and next-nearest neighbor eigenvalue spacings, and the rigidity, with the predictions of RMT. To examine the statistics of the eigenvalue spacings, we first apply the "unfolding" transformation described in [6] to the data. After unfolding the eigenvalues below $\lambda_{max}$, we compare their nearest neighbor and next-nearest neighbor spacing distributions with those for GOE. The theoretical predictions for the nearest neighbor spacing and the next-nearest neighbor spacing are given in (17) and (18), respectively. Fig. 2 shows the spacings of the small eigenvalues for TSE, which agree well with the prediction of RMT. For the rigidity $\Delta(L)$, the theoretical prediction is given in eq. (19). Fig. 3 compares the rigidity of the eigenvalues below $\lambda_{max}$ of the cross correlation matrix for TSE with RMT and shows good agreement.
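The unfolding procedure of [6] is not reproduced here. A simpler stand-in, which is an assumption and not the authors' method, unfolds the bulk eigenvalues with the cumulative distribution $F_W$ of the fitted Wishart density (9). The map is $\xi_k = n\,F_W(\lambda_k)$, where $n$ is the number of eigenvalues being unfolded. If (9) describes the bulk, the unfolded levels $\xi_k$ have unit mean spacing. The helper names are hypothetical.

```python
import math

def wishart_cdf(lam, Q, sigma2=1.0, n=1000):
    """Fraction of the density (9) below lam, by midpoint integration."""
    r = 1.0 / Q
    lo = sigma2 * (1 + r - 2 * math.sqrt(r))
    hi = sigma2 * (1 + r + 2 * math.sqrt(r))
    if lam <= lo:
        return 0.0
    if lam >= hi:
        return 1.0
    h = (lam - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += math.sqrt((hi - x) * (x - lo)) / x
    return Q / (2 * math.pi * sigma2) * total * h

def unfold(eigs, Q, sigma2=1.0):
    """Unfolded levels xi_k = n * F_W(lambda_k), in increasing order."""
    n = len(eigs)
    return [n * wishart_cdf(x, Q, sigma2) for x in sorted(eigs)]

def spacings(levels):
    """Nearest neighbor spacings of a sorted list of levels."""
    return [b - a for a, b in zip(levels, levels[1:])]

xi = unfold([0.5, 1.0, 1.5, 2.0], 3.75)
print([round(s, 3) for s in spacings(xi)])
```

Because the map uses a smooth theoretical curve, it removes the variation of the mean level density without touching the local fluctuations that (17)-(19) describe.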

In Fig. 4 we plot the calculated IPR for the eigenvectors of the cross correlation matrix of TSE. One sees that the IPR agrees with the prediction of RMT for eigenvalues around 1. There are also eigenvectors whose IPR is larger than the RMT prediction, and these reflect deterministic correlations. As seen in Fig. 4, such deviations appear at the large eigenvalues. However, one also sees a deviation at small eigenvalues, concentrated at the lower edge. Plerou et al. constructed a simple model for it [6]. We study this deviation closely in Sec. VI.
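The IPR of (21) is easy to compute, and its RMT baseline can be checked by sampling. The sketch below uses a normalized vector of i.i.d. Gaussian components as a stand-in for a GOE eigenvector; both are uniformly distributed on the unit sphere. For such a vector, the exact mean IPR is $3/(N+2)$, which approaches $3/N$ for large $N$. The helper names, the seed and the sample size are arbitrary choices.

```python
import math
import random

def ipr(u):
    """Inverse participation ratio of eq. (21): the sum of the fourth powers of the components."""
    return sum(x ** 4 for x in u)

def random_unit_vector(N, rng):
    """Normalized vector of i.i.d. Gaussian components (uniform on the unit sphere)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(N)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

N = 493
rng = random.Random(0)
mean_ipr = sum(ipr(random_unit_vector(N, rng)) for _ in range(1000)) / 1000
# A vector spread evenly over 6 issues, like one group in the model of Sec. VI.
localized = [1 / math.sqrt(6)] * 6 + [0.0] * (N - 6)
print(f"random: {mean_ipr:.5f}  (3/N = {3 / N:.5f})  localized: {ipr(localized):.4f}")
```

An eigenvector confined to a few strongly correlated issues therefore stands out by roughly two orders of magnitude from the random baseline at $N = 493$.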

As mentioned, we also performed the same analysis on the S&P data for comparison. Results for the rigidity and IPR are shown in Figs. 3 and 4. We found that the conclusions of Plerou et al. […] on eigenvalue spacings for 30-minute NYSE data also hold for our daily S&P data.
FIG. 4: The upper two figures show the IPR for TSE and S&P. The lower two figures show the IPR for TSE and S&P at small eigenvalues. The dotted lines are the prediction of RMT.



V. STABILITY OF EIGENVALUE DISTRIBUTION OF THE WISHART MATRIX IN THE PRESENCE OF DETERMINISTIC CORRELATIONS

In the previous section, we found that the Wishart matrix reproduces the small eigenvalue distributions of the cross correlation matrices of TSE and S&P well, as previously found in Ref. […]. The Wishart matrix is generated by random walks without any deterministic correlations. The real stock data, by contrast, have large eigenvalues that deviate from the Wishart matrix, which indicates the existence of deterministic correlations.

In this section, we therefore examine whether the random eigenvalue distribution of the cross correlation matrix W of random walks remains stable when deterministic correlations are included.

Let us consider a set of random walks whose deterministic correlation matrix $\tilde{C}$ has a finite number of large eigenvalues, while the other eigenvalues are small. We assume that the $T \times N$ matrix $M_{ti} = x_i(t)$ has a deterministic correlation of the form

$$\langle M_{ti} \rangle = 0, \qquad (23)$$

$$\langle M_{ti} M_{\tau j} \rangle = D_{t\tau} \tilde{C}_{ij}. \qquad (24)$$

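After $T$ steps, the cross correlation matrix is given by $M^t M$. As in RMT, the eigenvalue distribution of $M^t M$ is calculated from the Green function

$$G(\lambda) = \left\langle \left(\lambda - M^t M\right)^{-1} \right\rangle \qquad (25)$$

by the formula

$$\rho(\lambda) = \frac{1}{2\pi N} \lim_{\epsilon \to +0} \mathrm{Im}\left[\mathrm{Tr}\,G(\lambda - i\epsilon) - \mathrm{Tr}\,G(\lambda + i\epsilon)\right]. \qquad (26)$$

This case was studied in Ref. […]. Using the replica method, the following Dyson-type equation for $G$ was obtained at $N, T \to \infty$ with $Q = T/N$ fixed:

$$G(\lambda) = \left[\lambda - \tilde{C}\,\mathrm{Tr}\left(\frac{D}{1 - D\,\mathrm{Tr}\left(\tilde{C} G(\lambda)\right)}\right)\right]^{-1}. \qquad (27)$$

Eq. (9) is readily obtained by putting $\tilde{C} = \sigma^2 \mathbf{1}$ and $D = \mathbf{1}/T$ and taking the trace of (27):

$$\mathrm{Tr}\,G(\lambda) = \frac{N}{\lambda - \frac{\sigma^2}{1 - \frac{\sigma^2}{T}\mathrm{Tr}\,G(\lambda)}}. \qquad (28)$$

Solving this second-order algebraic equation for $\mathrm{Tr}\,G(\lambda)$ and putting the solution into (26) yields (9) and (10).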

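Now we assume that $\tilde{C}$ has $L$ large eigenvalues $\tilde{\lambda}_k$ ($k = 1, 2, \ldots, L$) and $N - L$ other eigenvalues $\tilde{\lambda}_k$ ($k = L+1, \ldots, N$), which we set to a common value $\tilde{\lambda}_S$. Since the trace of the correlation matrix equals $N$ by definition, we have

$$\sum_{k=1}^{L} \tilde{\lambda}_k + (N - L)\,\tilde{\lambda}_S = N. \qquad (29)$$

We also assume no temporal correlations and thus set $D = \mathbf{1}/T$. From (27), the eigenvalues $\lambda^G_k(\lambda)$ of $G(\lambda)$ are given by

$$\lambda^G_k(\lambda) = \frac{1}{\lambda - \frac{\tilde{\lambda}_k}{1 - \frac{1}{T}\left(\mathrm{Tr}_S \tilde{C}G + \mathrm{Tr}_L \tilde{C}G\right)}}. \qquad (30)$$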



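Let us split the trace as $\mathrm{Tr} = \mathrm{Tr}_L + \mathrm{Tr}_S$. Here $\mathrm{Tr}_L$ and $\mathrm{Tr}_S$ are the traces over the eigenspaces spanned by the eigenvectors for $\tilde{\lambda}_k$ ($k = 1, \ldots, L$) and $\tilde{\lambda}_k$ ($k = L+1, \ldots, N$), respectively. Summation over $k = L+1, \ldots, N$ gives

$$\mathrm{Tr}_S\, G(\lambda) = \sum_{k=L+1}^{N} \lambda^G_k(\lambda) = \frac{N - L}{\lambda - \frac{\tilde{\lambda}_S}{1 - \frac{1}{T}\left(\mathrm{Tr}_S \tilde{C}G + \mathrm{Tr}_L \tilde{C}G\right)}}. \qquad (31)$$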

For large $N$, $\rho(\lambda)$ should have finite supports around the $\tilde{\lambda}_k$ on the real $\lambda$ axis. We denote the supports for the large and small eigenvalues by $D_L$ and $D_S$, respectively. We assume

$$\tilde{\lambda}_S \ll \tilde{\lambda}_k \quad (k = 1, \ldots, L), \qquad (32)$$

so that $D_S$ and $D_L$ do not overlap. In that case, $\lambda^G_k$ ($k = 1, \ldots, L$) is analytic in $D_S$, while $\lambda^G_k$ ($k = L+1, \ldots, N$) has a branch cut there. Thus in $D_S$, $\rho(\lambda)$ is determined by the imaginary part of $\mathrm{Tr}_S G$. The contribution from $\lambda^G_k$ ($k = 1, \ldots, L$) to $\mathrm{Tr}_S G$ enters through $\mathrm{Tr}_L \tilde{C}G$ on the right-hand side of (31). Since $\lambda^G_k$ ($k = 1, \ldots, L$) is analytic in the neighborhood of $D_S$, $\mathrm{Tr}_L \tilde{C}G$ is bounded by a constant. Since $\lambda^G_k$ is an algebraic function of $N$ and the scaling behavior consistent with (30) is $O(1)$, the constant can be taken to be independent of $N$. Thus if, for $k = 1, \ldots, L$,

$$L\,\tilde{\lambda}_k \ll N\,\tilde{\lambda}_S, \qquad (33)$$

then

$$\mathrm{Tr}_S \tilde{C}G = \tilde{\lambda}_S\,\mathrm{Tr}_S G \gg \mathrm{Tr}_L \tilde{C}G$$

for large $N$, because $\mathrm{Tr}_S G$ grows as $N \to \infty$. Then (31) is approximated by

$$\mathrm{Tr}_S\, G(\lambda) \simeq \frac{N - L}{\lambda - \frac{\tilde{\lambda}_S}{1 - \frac{\tilde{\lambda}_S}{T}\mathrm{Tr}_S G(\lambda)}}. \qquad (34)$$

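Eq. (34) is identical to (28) with $\sigma^2 = \tilde{\lambda}_S$ and $N$ replaced by $N - L$. Putting the solution of (34) into (26), we get

$$\rho(\lambda) = \frac{N - L}{N}\,\frac{Q'}{2\pi\tilde{\lambda}_S}\,\frac{\sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}}{\lambda}, \qquad Q' = \frac{T}{N - L}, \qquad (35)$$

where $\lambda_{max,min}$ are given by (10) with $\sigma^2 = \tilde{\lambda}_S$ and $Q$ replaced by $Q'$. This formula is valid under (32) and (33). Note that (32) and (33) impose a trade-off among $N$, $L$, $\tilde{\lambda}_S$ and $\tilde{\lambda}_k$ ($k = 1, \ldots, L$). Thus the distribution of the $N - L$ small eigenvalues of this model can be approximated by that of the Wishart matrix.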

To conclude, the distribution of the small eigenvalues remains the same in the limit $N \to \infty$, as long as the number of large eigenvalues of the deterministic correlation $\tilde{C}$ is finite and these eigenvalues appear only outside of $D_S$.
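The reduced Wishart form obtained from (34) can be evaluated directly. In this sketch, the common small eigenvalue $\tilde{\lambda}_S$ follows from requiring the deterministic correlation matrix to have trace $N$. The Wishart edges use $\sigma^2 = \tilde{\lambda}_S$, with $N$ replaced by $N - L$ also in $Q = T/N$. The large eigenvalues are the six TSE values used in the Monte Carlo check described next. $T = 1847$ is an assumption borrowed from Sec. VI, and the helper names are hypothetical.

```python
import math

def reduced_bulk(large_eigs, N, T):
    """Common small eigenvalue, bulk edges and Q' = T/(N - L) for the reduced Wishart form."""
    L = len(large_eigs)
    lam_s = (N - sum(large_eigs)) / (N - L)   # trace of the deterministic matrix equals N
    q = T / (N - L)                             # Q with N replaced by N - L
    r = 1.0 / q
    lo = lam_s * (1 + r - 2 * math.sqrt(r))
    hi = lam_s * (1 + r + 2 * math.sqrt(r))
    return lam_s, lo, hi, q

def reduced_density(lam, large_eigs, N, T):
    """Bulk density of the small eigenvalues, normalized to the weight (N - L)/N."""
    lam_s, lo, hi, q = reduced_bulk(large_eigs, N, T)
    if lam <= lo or lam >= hi:
        return 0.0
    L = len(large_eigs)
    return (N - L) / N * q / (2 * math.pi * lam_s) * math.sqrt((hi - lam) * (lam - lo)) / lam

large = [121.6, 14.5, 11.4, 7.9, 4.7, 4.0]   # observed large eigenvalues of TSE
lam_s, lo, hi, q = reduced_bulk(large, 493, 1847)
print(f"lambda_S = {lam_s:.4f}, bulk = ({lo:.3f}, {hi:.3f})")
```

Condition (32) can be read off directly: the smallest of the large eigenvalues, 4.0, is about six times $\tilde{\lambda}_S$.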

To confirm the validity of the approximation, we performed a Monte Carlo simulation with 6 large eigenvalues. We chose the large eigenvalues to be 121.6, 14.5, 11.4, 7.9, 4.7 and 4.0, which are the observed large eigenvalues of TSE. The result is shown in Fig. 5. The simulated large eigenvalues lie near the input values taken from the real correlation matrix, and the distribution of the Wishart matrix reproduces the small eigenvalue distribution well. We also examined other values of the large eigenvalues and obtained similar results. Moreover, randomness spreads each observed large eigenvalue over a finite width, and the width is larger for a larger eigenvalue.
FIG. 5: The solid line is for the model whose deterministic correlation has the large eigenvalues of the real correlation matrix, and the dotted line is for (9) with σ² = λ̃_S given by (29). The small eigenvalue distributions (upper graph) are very close. The middle graphs show the large eigenvalue distribution. We take the large eigenvalues to be 121.6, 14.5, 11.4, 7.9, 4.7 and 4.0, the values found for TSE. The eigenvalues are observed in the neighborhood of these values, with a finite width due to randomness; the width is larger for the larger eigenvalues.



VI. LEVEL REPULSION OF DETERMINISTIC CORRELATIONS BY RANDOMNESS 



According to Plerou et al. [6], the deviation at small eigenvalues arises from strong correlations among a small number of issues. The following model, in which N issues have an equal correlation c, illustrates this well:



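$$C = \begin{pmatrix} 1 & c & \cdots & c \\ c & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & c \\ c & \cdots & c & 1 \end{pmatrix}. \qquad (36)$$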



$C$ has a non-degenerate eigenvalue $1 + (N-1)c$ and an eigenvalue $1 - c$ with degeneracy $N - 1$. The eigenvalue $1 - c$ becomes small if $c$ is close to 1, i.e. for strong correlation. Its eigenvectors have non-zero components at the correlated issues, which results in a large IPR.
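The two eigenvalues quoted for (36) can be verified without a diagonalization routine. Multiplying $C$ by the all-ones vector gives $1 + (N-1)c$ times that vector. Multiplying $C$ by any vector whose components sum to zero, here $(1, -1, 0, \ldots)$, gives $1 - c$ times that vector, and there are $N - 1$ independent such vectors. The sketch uses the TSE value $c = 0.74$ with an arbitrary small $N$ and hypothetical helper names.

```python
def equal_correlation_matrix(N, c):
    """Eq. (36): ones on the diagonal, c everywhere else."""
    return [[1.0 if i == j else c for j in range(N)] for i in range(N)]

def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

N, c = 6, 0.74
C = equal_correlation_matrix(N, c)
ones = [1.0] * N
diff = [1.0, -1.0] + [0.0] * (N - 2)   # components sum to zero
print(matvec(C, ones)[0], 1 + (N - 1) * c)
print(matvec(C, diff)[0], 1 - c)
```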

However, this reasoning about large-IPR eigenvectors at small eigenvalues is not sufficient to explain two facts. First, eigenvectors with large IPR appear only below the bulk of the eigenvalue distribution of the Wishart matrix, concentrated at the lower edge. Since the correlations c should be distributed over a wide range, eigenvectors with large IPR should also be distributed over a wide range of eigenvalues. Thus the absence of small eigenvalues with large IPR within the bulk is puzzling. Second, each eigenvector with large IPR is observed at a smaller eigenvalue than the model above predicts. The largest non-diagonal element of the correlation matrix of TSE (S&P) is 0.74 (0.83), so eq. (36) implies that the large-IPR eigenvector with the smallest eigenvalue should be observed at 0.26 (0.18). In fact, the smallest eigenvalue with large IPR is observed at 0.11 (0.14), which is smaller than the lower bound of the eigenvalue distribution of the Wishart matrix.

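These two facts motivate us to study the interplay between deterministic correlations and randomness. We consider a model of random walks with a deterministic correlation matrix $\tilde{C}$ and examine the IPR of the eigenvectors of the cross correlation matrix $C$. As a simple model, we assume that $\tilde{C}$ has the following block-diagonal form:

$$\tilde{C} = \begin{pmatrix} C_1 & & & & 0 \\ & C_2 & & & \\ & & \ddots & & \\ & & & C_L & \\ 0 & & & & \mathbf{1} \end{pmatrix}. \qquad (37)$$

Here $C_l$ ($l = 1, \ldots, L$) and $\mathbf{1}$ are

$$C_l = \begin{pmatrix} 1 & c_l & \cdots & c_l \\ c_l & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & c_l \\ c_l & \cdots & c_l & 1 \end{pmatrix}, \qquad \mathbf{1} = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}. \qquad (38)$$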



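The form of $\tilde{C}$ assumes $L$ groups of issues with strong correlations. We consider $N$ random walks $x_i(t)$ with

$$\langle x_i(t)\, x_j(\tau) \rangle = \tilde{C}_{ij}\, \delta_{t\tau} \qquad (39)$$

and examine their $T$-step cross correlation matrix

$$C_{ij} = \frac{1}{T}\left(M^t M\right)_{ij} = \frac{1}{T}\sum_{t=1}^{T} x_i(t)\, x_j(t). \qquad (40)$$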



We set $N = 493$ and $T = 1847$ following our TSE data. We set the number $L$ of strongly correlated groups to 4 and the number $M$ of issues participating in each group to 6. We choose the correlations to be $c_1 = 0.8$, $c_2 = 0.6$, $c_3 = 0.4$ and $c_4 = 0.2$. We performed a Monte Carlo simulation of this model and present the IPR of the eigenvectors in Fig. 6. Fig. 6 shows that, as in the real stock data, eigenvalues with large IPR lie outside the bounds of the eigenvalue distribution generated by randomness. For this model, the simple reasoning above predicts 20 small eigenvalues with large IPR (counting degeneracies), but only 10 of the observed ones have large IPR. This implies that, when small eigenvalues arising from a strong correlation fall within the bounds of the Wishart matrix, the IPRs of their eigenvectors become smaller and cannot be distinguished from those of the random eigenvalues. This is one effect of randomness on deterministic correlations. We also note that even the eigenvectors whose IPR exceeds the RMT value have a smaller IPR than expected.
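The Monte Carlo model (37)-(40) can be simulated without a matrix factorization, because each block of (38) has equal off-diagonal elements. For the members of group $l$, write $x_i(t) = \sqrt{c_l}\,z_l(t) + \sqrt{1-c_l}\,\epsilon_i(t)$ with independent standard normal $z_l(t)$ and $\epsilon_i(t)$. This gives unit variance and pairwise correlation $c_l$ inside the group, and zero correlation with every other issue. This one-factor construction is our choice for illustration and not necessarily the authors' implementation. The names are hypothetical. Diagonalizing the full $N \times N$ matrix (40), which would normally be delegated to a linear-algebra library, is not shown.

```python
import math
import random

def simulate_block_model(N, T, M, cs, seed=0):
    """Series x_i(t) with <x_i(t) x_j(tau)> = Ctilde_ij delta_{t,tau} for the model (37)-(39).

    Group l consists of issues l*M, ..., l*M + M - 1 with pairwise correlation cs[l];
    the remaining N - len(cs)*M issues are independent.
    """
    rng = random.Random(seed)
    series = [[0.0] * T for _ in range(N)]
    for t in range(T):
        for l, c in enumerate(cs):
            z = rng.gauss(0.0, 1.0)  # common factor of group l at time t
            for i in range(l * M, (l + 1) * M):
                series[i][t] = math.sqrt(c) * z + math.sqrt(1 - c) * rng.gauss(0.0, 1.0)
        for i in range(len(cs) * M, N):
            series[i][t] = rng.gauss(0.0, 1.0)
    return series

def cross_corr(x, y):
    """C_ij = (1/T) sum_t x_i(t) x_j(t), eq. (40)."""
    return sum(a * b for a, b in zip(x, y)) / len(x)

# Parameters of the text would be simulate_block_model(493, 1847, 6, [0.8, 0.6, 0.4, 0.2]).
x = simulate_block_model(40, 1847, 6, [0.8, 0.6, 0.4, 0.2], seed=1)
print(round(cross_corr(x[0], x[1]), 3), round(cross_corr(x[0], x[6]), 3))
```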

Moreover, $\tilde{C}$ has small eigenvalues 0.2 and 0.4, while the corresponding eigenvalues of $C$ are distributed in the vicinity of 0.14 and 0.22, respectively. On the other hand, the eigenvalues of $C$ corresponding to the large eigenvalues of $\tilde{C}$ are shifted above their original values. Namely, the eigenvalues of $C$ that arise from the deterministic correlation are repelled from the distribution of the random eigenvalues. Monte Carlo simulations with other parameters for $\tilde{C}$ gave similar results. This may be interpreted as a manifestation of the universal effect of randomness called "level repulsion" […]. According to RMT, the eigenvalues of random matrices repel each other through the logarithmic potential $-\ln|\lambda_i - \lambda_j|$ in (14). Even when some deterministic terms are present, this logarithmic potential causes a repulsion between eigenvalues. This universal effect has been observed in various systems, such as the energy levels of complex nuclei. In the present case, the eigenvalues arising from deterministic correlations between random walks are repelled from the bulk distribution of the random eigenvalues. The eigenvalues within the RMT bounds generate a repulsive potential that pushes away the eigenvalues outside them.
FIG. 6: The upper graph shows the IPR of the eigenvectors of the deterministic correlation matrix C̃ of the model (37)-(40). The lower graph shows the IPR of the eigenvectors of C. In the simulation, we set N = 493, T = 1847, M = 6, L = 4, c_1 = 0.8, c_2 = 0.6, c_3 = 0.4 and c_4 = 0.2.



We can deduce this "level repulsion" by solving the Dyson-type equation (27) numerically. For simplicity, we assume that all eigenvalues of $\tilde{C}$ are 1 except for one eigenvalue that is smaller or larger than 1. We solve (27) numerically for $N = 293$ and $T = 1847$ and obtain the relation between this smaller (or larger) eigenvalue of $\tilde{C}$ and the corresponding eigenvalue of $C$. The result is shown in Fig. 7: smaller (larger) eigenvalues of $\tilde{C}$ are repelled by the bulk distribution around 1 and are observed as smaller (larger) eigenvalues of $C$.
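Here is a minimal sketch of that calculation under stated assumptions. All eigenvalues of the deterministic matrix (written $\tilde{C}$ here) equal 1 except one value $a$. The limit $N, T \to \infty$ is taken with $c = N/T$ fixed, so the single outlier does not change the bulk. Then $(1/T)\mathrm{Tr}(\tilde{C}G)$ can be replaced by $c\,g(\lambda)$, where $g$ is the per-issue Wishart resolvent for $\sigma^2 = 1$. The observed eigenvalue is the pole of (30), i.e. the solution of $\lambda\,(1 - c\,g(\lambda)) = a$, which the sketch finds by bisection. The function names are hypothetical.

As a cross-check, the result coincides with the closed form $a + ca/(a-1)$ known from the spiked-covariance literature. This value is larger than $a$ for $a > 1 + \sqrt{c}$ and smaller than $a$ for $a < 1 - \sqrt{c}$, which is the repulsion seen in Fig. 7.

```python
import math

def wishart_resolvent(lam, c):
    """g(lam) = (1/N) Tr (lam - W)^{-1} for a Wishart matrix with sigma^2 = 1 and c = N/T.

    Valid for real lam outside the bulk. g solves c*lam*g^2 - (lam - 1 + c)*g + 1 = 0;
    the root behaving like 1/lam for large lam is selected, in a cancellation-free form.
    """
    u = lam - 1 + c
    disc = max(u * u - 4 * c * lam, 0.0)
    return 2.0 / (u + math.copysign(math.sqrt(disc), u))

def observed_outlier(a, c, tol=1e-13):
    """Observed position of a single deterministic eigenvalue a when all others equal 1."""
    edge = math.sqrt(c)

    def f(lam):
        return lam * (1 - c * wishart_resolvent(lam, c)) - a

    if a > 1 + edge:
        lo = (1 + edge) ** 2              # upper edge of the bulk
        hi = lo + 1.0
        while f(hi) < 0:
            hi *= 2
    elif 0 < a < 1 - edge:
        lo, hi = 1e-12, (1 - edge) ** 2   # between 0 and the lower edge of the bulk
    else:
        raise ValueError("a lies within 1 +/- sqrt(c): no isolated eigenvalue")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = 293 / 1847  # N and T quoted for Fig. 7
for a in (0.2, 0.4, 2.0, 4.0):
    print(a, round(observed_outlier(a, c), 4), round(a + c * a / (a - 1), 4))
```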

Thus we found two interplays between deterministic correlations and randomness. When groups of issues have strong correlations, large and small eigenvalues appear in the cross correlation matrix. Some of these eigenvalues are absorbed into the RMT bounds, and their IPR becomes as small as the RMT value, so they cannot be distinguished from random eigenvalues. On the other hand, eigenvalues from deterministic correlations outside the RMT bounds feel the repulsive potential generated by the bulk distribution of randomness; at the lower edge, they are shifted to smaller values. We believe that these two effects explain the two deviations raised in this section.
FIG. 7: The effect of level repulsion on the eigenvalues of C. The horizontal axis is the small (large) eigenvalue of C̃ and the vertical axis is the corresponding eigenvalue of C. The upper (lower) graph is for the case where the eigenvalue of C̃ is smaller (larger) than 1. The crosses are the result of a Monte Carlo simulation based on eqs. (39) and (40). The straight line corresponds to the absence of randomness, when the eigenvalues of C are identical to those of C̃. The eigenvalues of C are repelled from the bulk in the vicinity of 1.



VII. GROUPS OF ISSUES FORMED BY STRONG CORRELATION 

We have seen that a group of issues with strong correlations produces eigenvectors of the cross correlation matrix with large IPR. Conversely, by examining the eigenvectors with large IPR, we may identify groups formed by strong correlations.

For NYSE, Plerou et al. […] examined the eigenvectors of large eigenvalues and identified strongly correlated issues by the criterion of having a large component in these eigenvectors. They found that the groups are formed according to industrial sectors. However, we found it difficult to apply their method to TSE. Eigenvectors for large eigenvalues have significant components that arise not only from correlations but also from randomness. Even if an issue has a large component in an eigenvector of a large eigenvalue, it is therefore difficult to tell whether this reflects a deterministic correlation or just randomness. Especially for TSE, the deterministic correlations are apparently not strong enough to make the separation straightforward. When we examined the eigenvectors of the large eigenvalues, we found it impossible to separate the groups of strongly correlated issues. For example, Fig. 8 shows the component distribution of the eigenvector for the sixth largest eigenvalue, 4.0, in TSE. The components have a continuous distribution, and it is hard to single out large components due to deterministic correlations.

FIG. 8: The component distribution of the eigenvector for the sixth largest eigenvalue, 4.0, of TSE. The components are distributed continuously, and it is hard to distinguish the components due to correlations.

Therefore, here we propose a supplementary method to identify strongly correlated components. As 
we saw in Sec. I VII when a group of issues is formed by strong correlations, they not only have a large 
component in the eigenvectors of the corresponding large eigenvalue, but also have a large component in 
the eigenvectors of the corresponding small eigenvalue. On the other hand, issues which do not have strong 
correlations with others should have the normal distribution in eigenvectors. Namely, the deviation from the 
normal distribution indicates the issue is correlated with others. To quantify how an issue has a distribution 
different from the normal distribution, we define a quantity Zi as follows. 

Zi= u «> ( 41 ) 

where $\delta_{th}$ is a threshold for the IPR and $(u_k)_i$ is the $i$-th component of the eigenvector $u_k$; that is, $Z_i$ is the sum of the squares of the $i$-th components of the eigenvectors whose IPR exceeds $\delta_{th}$. We set $\delta_{th} = 0.008$ (0.02) for TSE (S&P), which selects 41 (28) eigenvectors. If the $i$-th issue has no true correlation with others, the components $(u_k)_i$ follow the normal distribution, and hence the probability of a large $Z_i$ should be small. Thus the $i$-th issue may be regarded as significantly correlated if $Z_i$ exceeds a certain threshold $a_{th}$. We choose $a_{th}$ so that the probability of $Z_i > a_{th}$ is 1% if the eigenvector components of the $i$-th issue follow the normal distribution. For our data, $a_{th} = 0.131$ (0.162) for TSE (S&P). If the $i$-th issue has a large component in an eigenvector, we consider it to belong to the corresponding group of strong correlations when $Z_i > a_{th}$.
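The selection rule above can be sketched in a few lines of NumPy. This is a hedged illustration rather than the authors' implementation. The 1% level and the role of $\delta_{th}$ come from the text. Three pieces are our assumptions: the null model, in which the $m$ selected components of an uncorrelated issue are independent $N(0, 1/N)$ variables, so that $N Z_i$ is chi-square with $m$ degrees of freedom; the Monte Carlo estimate of $a_{th}$; and the hypothetical `cut` parameter that stands in for the unquantified notion of a "large component".

```python
import numpy as np

def z_scores(eigvecs, ipr, delta_th):
    """Z_i of Eq. (41): for each issue i (row), the sum of its squared components
    over the eigenvectors (columns) whose IPR exceeds delta_th."""
    selected = ipr > delta_th
    z = np.sum(eigvecs[:, selected] ** 2, axis=1)
    return z, int(selected.sum())

def z_threshold(n_issues, n_selected, p=0.01, n_samples=200_000, seed=0):
    """Monte Carlo a_th with P(Z_i > a_th) = p under the assumed null model:
    n_selected independent N(0, 1/n_issues) components, so that
    n_issues * Z_i is chi-square distributed with n_selected degrees of freedom."""
    rng = np.random.default_rng(seed)
    null_z = rng.chisquare(n_selected, size=n_samples) / n_issues
    return float(np.quantile(null_z, 1.0 - p))

def group_members(eigvecs, k, z, a_th, cut=3.0):
    """Indices of issues assigned to the group of eigenvector column k.

    Assumption: a 'large' component means |(u_k)_i| > cut / sqrt(N);
    the issue must also satisfy Z_i > a_th."""
    n = eigvecs.shape[0]
    large = np.abs(eigvecs[:, k]) > cut / np.sqrt(n)
    return np.flatnonzero(large & (z > a_th))
```

With `eigvecs` from `np.linalg.eigh` (ascending order), the eigenvector of the second largest eigenvalue is column `-2`. As an analytic cross-check of the Monte Carlo threshold, `scipy.stats.chi2.ppf(1 - p, n_selected) / n_issues` evaluates the same chi-square quantile.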

We applied this method to the large eigenvalues observed in our market data. The results are shown in TABLE 1 and TABLE 2.

TABLE 1: TSE issues with $Z_i > a_{th}$

Eigenvector  TSE code  Company name             Sector
u2           6701      NEC                      Electric Products
u2           6702      Fujitsu                  Electric Products
u2           8035      Tokyo Electron           Electric Products
u3           1888      Wakachiku Construction   Construction
u3           8834      Douwa Real Estate        Real Estate
u4           9501      Tokyo Electric Power     Electric Power
u4           9503      Kansai Electric Power    Electric Power
u4           9504      Chuugoku Electric Power  Electric Power
u4           9506      Tohoku Electric Power    Electric Power
u4           9509      Hokkaido Electric Power  Electric Power
u5           1888      Wakachiku Construction   Construction
u5           8834      Douwa Real Estate        Real Estate
u5           1801      Taisei Corporation       Construction
u5           1804      Satou Kogyo              Construction
u5           1805      Tobishima Construction   Construction
u5           1806      Fujita Corporation       Construction
u5           1886      Aoki Corporation         Construction
u5           8601      Daiwa Securities         Finance
u5           8603      Nikko Cordial Group      Finance
u6           8834      Douwa Real Estate        Real Estate
u6           9501      Tokyo Electric Power     Electric Power
u6           9503      Kansai Electric Power    Electric Power
u6           9504      Chuugoku Electric Power  Electric Power
u6           9506      Tohoku Electric Power    Electric Power
u6           9509      Hokkaido Electric Power  Electric Power
u6           1804      Satou Kogyo              Construction
u6           1805      Tobishima Construction   Construction
u6           1806      Fujita Corporation       Construction
u6           1886      Aoki Corporation         Construction
u7           9504      Chuugoku Electric Power  Electric Power
u7           9506      Tohoku Electric Power    Electric Power
u7           5801      Furukawa Electric        Nonferrous Metal
u7           8004      Nichimen                 Wholesale
u7           8335      Ashikaga Bank            Bank
u7           9766      Konami                   Service
u8           8004      Nichimen                 Wholesale
u8           8335      Ashikaga Bank            Bank
u8           8752      [illegible]              Insurance


TABLE 2: S&P issues with $Z_i > a_{th}$

Eigenvector  Ticker  Company name               Industries
u2           AEP     American Electric Power    Electric Power
u2           DUK     Duke Energy Corporation    Electric Power, Natural Gas
u3           APC     Anadarko Petroleum Corp.   Oil, Gas
u3           BHI     Baker Hughes Inc.          Oil Related
u3           XOM     Exxon Mobil Corporation    Oil, Coal, Copper
u3           HAL     Halliburton Company        Oil, Gas
u3           RD      Royal Dutch Petroleum Co.  Oil, Gas, Chemical
u3           SLB     Schlumberger Ltd.          Oil
u3           UCL     Unocal Corporation         Oil, Gas
u4           GP      Georgia-Pacific Group      Paper Manufacture, Pulp
u4           IP      International Paper Co.    Paper Manufacture
u4           MEA     Mead Corporation           Paper Manufacture, Pulp, Gum
u4           WY      Weyerhaeuser Company       Paper Manufacture, Pulp, Forestry, Wooden Goods
u5           MRK     Merck & Co., Inc.          Medicine Manufacture
u5           PFE     Pfizer Inc.                Medicine Manufacture
u5           SGP     Schering-Plough Corp.      Medicine Manufacture
u6           BK      Bank of New York Co.       Bank
u6           JPM     J. P. Morgan Chase & Co.   Finance
u6           PNC     PNC Financial Services     Finance
u6           STI     SunTrust Banks, Inc.       Bank
u7           ABX     Barrick Gold Corp.         Gold Mining, Gold Goods
u7           HM      Homestake Mining Co.       Gold Mining
u7           NEM     Newmont Mining Corp.       Gold Mining
u7           PDG     Placer Dome Inc.           Gold Mining
u8           SBC     SBC Communications Inc.    Telecommunication, Cable Television, Internet
u8           VZ      Verizon Communications     Telecommunication, Internet
u8           MU      Micron Technology, Inc.    Semiconductor
u8           TXN     Texas Instruments          Semiconductor
u9           AMR     AMR Corporation            Aviation
u9           DAL     Delta Air Lines, Inc.      Aviation
u9           F       Ford Motor Company         Automobile
u9           GM      General Motors Corp.       Automobile
u10          EIX     Edison International       Holding Company of Electric Power
u10          PCG     PG&E Corporation           Holding Company of Electric Power
u11          AL      Alcan Inc.                 Aluminium, Aluminium Can
u11          AA      Alcoa, Inc.                Aluminium


In S&P, the Electric Power sector and the Oil and Gas related sectors play major roles in the correlations. In TSE, the Electric Products and Construction sectors play major roles.

In S&P, each eigenvector corresponds to an industrial sector, which means that each industrial sector forms a strongly correlated group. In TSE, on the other hand, some eigenvectors have participants from different industrial sectors, which may indicate a more complicated correlation structure of the market. Thus TSE and S&P (NYSE) seem to differ in the structure of their correlations, while the "random" part is well described by the universal theory in both markets. It would be interesting to find the origin of this difference, which might give some insight into the differences between the economic structures of the two countries.

As far as our data samples are concerned, we may conclude that the proposed method, which utilizes the eigenvectors of small eigenvalues and their IPR, effectively distinguishes strongly correlated groups in the markets.

We note that Giada et al. investigated the grouping of S&P data in Ref. Q| based on a model considered by Noh [l^. The method proposed in Refs. [lil Hj| has the advantage of directly giving the "noise-undressed" correlation matrix. However, their method assumes that each issue belongs to only one cluster of correlated issues. According to our analysis, this assumption does not quite hold: for example, "Tohoku Electric Power" appears in three different groups in TABLE 1. Therefore we believe that further analysis based on more conservative assumptions is needed before the estimated "true" correlation is applied to portfolio management.
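One simple way to quantify how often the single-cluster assumption fails in our data is to count, for each issue, the number of groups it is assigned to. The Python sketch below does this for the (eigenvector, TSE code) pairs listed in TABLE 1. The variable and function names are our own, and the check is illustrative only.

```python
from collections import defaultdict

# (eigenvector, TSE code) pairs transcribed from TABLE 1.
TABLE_1 = [
    ("u2", 6701), ("u2", 6702), ("u2", 8035),
    ("u3", 1888), ("u3", 8834),
    ("u4", 9501), ("u4", 9503), ("u4", 9504), ("u4", 9506), ("u4", 9509),
    ("u5", 1888), ("u5", 8834), ("u5", 1801), ("u5", 1804), ("u5", 1805),
    ("u5", 1806), ("u5", 1886), ("u5", 8601), ("u5", 8603),
    ("u6", 8834), ("u6", 9501), ("u6", 9503), ("u6", 9504), ("u6", 9506),
    ("u6", 9509), ("u6", 1804), ("u6", 1805), ("u6", 1806), ("u6", 1886),
    ("u7", 9504), ("u7", 9506), ("u7", 5801), ("u7", 8004), ("u7", 8335),
    ("u7", 9766),
    ("u8", 8004), ("u8", 8335), ("u8", 8752),
]

def memberships(rows):
    """Map each TSE code to the set of eigenvector groups it belongs to."""
    groups = defaultdict(set)
    for vector, code in rows:
        groups[code].add(vector)
    return groups

# Issues that belong to more than one group, i.e. violate a single-cluster assumption.
multi = {code: sorted(vecs) for code, vecs in memberships(TABLE_1).items() if len(vecs) > 1}
for code, vecs in sorted(multi.items()):
    print(code, vecs)
```

Besides Tohoku Electric Power (9506), the same count finds Douwa Real Estate (8834) and Chuugoku Electric Power (9504) in three groups each.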
VIII. CONCLUSIONS



We analyzed the eigenvalues and eigenvectors of the cross-correlation matrices of TSE and NYSE (S&P500) stock market data. We found that the results of Refs. 0, 0, |(| reported for NYSE are also valid for TSE. The eigenvalue distribution obeys the RMT prediction in the bulk, but there are some deviations at the large eigenvalues. We also examined the nearest-neighbor spacing, the next-nearest-neighbor spacing and the rigidity of the eigenvalues, and found that they follow the universality of the GOE. These results are consistent with the previous studies of NYSE and imply that the large eigenvalues are due to the existence of correlations, while the eigenvalues distributed in the bulk are due to randomness. We also examined the IPR of the eigenvectors of the correlation matrices. In the bulk, the IPR distribution follows the prediction of the GOE, but there are deviations outside the RMT bounds. Plerou et al. 0, argued that the deviations at the lower edge are due to strong correlations. We found that this reasoning is qualitatively valid, but it cannot quantitatively explain why the small eigenvalues with large IPR concentrate at the lower edge and why the observed eigenvalues are smaller than the expected values.

To explain this phenomenon, we studied RMT with deterministic correlations. We found that each eigenvalue arising from deterministic correlations is observed at a value repelled from the bulk distribution. We interpreted this repulsion as reminiscent of the effect of randomness known as "level repulsion". This effect was deduced by solving a Dyson-type equation numerically.

We also proposed a method, based on the IPR, to identify strongly correlated groups of issues. It reduces the accidental inclusion of uncorrelated issues. Applying this method, we found that the issues of S&P are grouped according to industrial sectors, whereas the issues of TSE are grouped in more complicated ways, suggesting some differences in the structure of the two markets.